We can think of a logistic regression as model in which individuals have a latent trait that determines, given a threshold, if they fall in one category or another. For instance, latent scholastic aptitude is a trait that (co-) determines if a student passes a class:
threshold = -2
set_par(cex = 1.5)
curve(dlogis(x),-6,6, xlim = c(-5,5),
xlab = "latent scholastic aptitude")
x = seq(threshold,5,.1)
shade(dlogis(x)~x,c(threshold,6), col = adjustcolor("green3",.5))
x = seq(-6,threshold,.1)
shade(dlogis(x)~x,c(-6,threshold), col = adjustcolor("red3",.5))
abline(v = threshold, lty = 2, lwd = 2)
In this model, the latent variable is distributed according to the logistic distribution.
We can calculate the log odds of an outcome as before, this time using probabilities.
\[ \textrm{log odds} = log \left( \frac{P_{fail}}{P_{pass}} \right) = log \left( \frac{P_{fail}}{1-P_{fail}} \right) \] To get the cumulative probability to fail at a certain threshold, we use the cumulative probability function of the logistic distribution, also called the inverse logit.
In R we can use boot::inv.logit or plogis
to get the cumulative density function of the logistic distribution.
set_par(cex = 1.5)
curve(plogis(x),-6,6,xlim = c(-5,5),
xlab = "latent scholastic aptitude",
ylab = "plogis(x) = inv_logit(x)")
curve(plogis(x),-6,-2,col = "red", add = T, lwd = 1.5)
curve(plogis(x),-2,6,col = "green3", add = T, lwd = 1.5)
lines(c(-6,threshold),rep(plogis(threshold),2), lty = 2, col = "red")
lines(rep(threshold,2),c(0,plogis(threshold)), lty = 2, col = "red")
P_fail = plogis(threshold)
P_pass = 1-P_fail
log_odds = log(P_fail/P_pass)
log_odds
## [1] -2
This is exactly our threshold and explains why the intercept
coefficient of an intercept-only logistic regression for data with a
fail-rate of 12% will be qlogis(.12) = -2. The following
figure shows that we can use the cumulative distribution function
(plogis or inv.logit) to go from a threshold
value to success probabilities and the quantile function
(qlogis or logit) to go from cumulative
success probabilities to thresholds.
set_par(mfrow = c(1,2), cex = 1.5)
curve(plogis(x),-5,5, xlab = "threshold", ylab = "plogis(x) or inv.logit(x)")
arrows(x0 = threshold, y0 = 0, y1 = plogis(threshold), col = "blue", lwd = 2)
arrows(x0 = threshold, x1 = -5.4, y0 = plogis(threshold), col = "blue", lwd = 2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),2), pos = 2, col = "blue", xpd = TRUE, cex = .75)
curve(qlogis(x),0,1, xlab = "cumulative probability", ylab = "qlogis(x) or logit(x)")
arrows(x0 = plogis(threshold), y0 = -5, y1 = threshold, col = "blue", lwd = 2)
arrows(y0 = threshold, x0 = plogis(threshold), x1 = -0.04, col = "blue", lwd = 2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),1), pos = 2, xpd = TRUE)
text(plogis(threshold),-5.4,labels = round(plogis(threshold),2), pos = 1, col = "blue", xpd = TRUE, cex = .75)
Now let us assume that there is another variable \(\small X\) that increases the probability to pass a class, so that also students with lower latent scholastic aptitude of -3 can pass a class (could be class difficulty).
In a logistic regression we estimate the effect of \(\small X\) as follows:
\[ log \left( \frac{P_{fail}}{P_{pass}} \right) = \alpha - \beta \cdot X \]
Here, \(\small \alpha\) is the intercept and baseline log-odds to pass the test (due to latent scholastic aptitude) and \(\small \beta\) captures log odds ratio from passing the class due to \(\small X\).
Because our baseline threshold / log odds is -2 and the threshold for children with variable X is -3, we calculate
\[ \beta = -2 -(-3) = 1 \]
So if we simulate data according to this data generating process and estimate a logistic regression, we should find that \(\small \alpha = -2\) and \(\small \beta = 1\).
Lets simulate the data:
set.seed(1)
N = 10000
X = rep(c(0,-1),N/2) %>% sort()
thresholds = -2 + X
pass = rlogis(N) > thresholds
We estimate the model:
q.fit = quap(
alist(
pass ~ dbinom(1,p),
logit(p) ~ -(a + b*X),
a ~ dnorm(0,3),
b ~ dnorm(0,3)
),
data = list(pass = pass, X = X)
)
precis(q.fit) %>%
round(2)
## mean sd 5.5% 94.5%
## a -2.03 0.04 -2.10 -1.96
## b 0.87 0.08 0.75 1.00
Due to simulation noise we are not exactly recovering the parameters, but we are getting very close to the correct thresholds with -2.03 at baseline and a + b = -2.03 + 0.87 = -1.16 for individuals where X = 1.
This shows us again that regression weights in logistic regressions represent changes in log odds ratios, and thus also changes in the threshold, due to a predictor variable.
The ordinal logistic regression is named as such, because just like logistic regression, it assumes a latent logistic variable that determines responses. Differently than standard logistic regression, ordered logistic regression uses multiple thresholds, which are used to map the latent variable onto ordered responses:
thresholds = c(-1.24,.25,2)
plot_dlogis.thresh = function(threshs, xlim = c(-5,5), plot.text = TRUE) {
threshs = as.numeric(threshs)
x.st = xlim[1]-1
x.en = xlim[2]+1
curve(dlogis(x),x.st,x.en,xlim = xlim,
xlab = "latent scholastic aptitude")
n_cat = length(threshs) + 1
clrs =
colorRampPalette(c("blue","blue4"))(n_cat)
for (k in 1:n_cat) {
st = ifelse(k == 1, x.st, threshs[k-1])
en = ifelse(k == n_cat, x.en, threshs[k])
x = seq(st,en,length.out = 50)
polygon(c(min(x),x,max(x)), c(0,dlogis(x),0), col = clrs[k])
arrows(x0 = threshs[k], y0 = 0, y1 = dlogis(0), lty = 2, col = "red", lwd = 2,length = 0)
if (plot.text == TRUE) {
text((st+en)/2,dlogis(0)/2,paste0("R=",k), col = "grey50", cex = 1.5)
text(threshs[k],dlogis(0),round(threshs[k],2), col = "red", pos = 2)
}
}
}
n_cat = length(thresholds) + 1
clrs = colorRampPalette(c("blue","blue4"))(n_cat)
set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)
Just as we can use the (cumulative) logistic distribution to recover threshold values (log odds) for the simple logistic regression, we can use it to recover thresholds from an ordered logistic model.
For instance, when 22.4% of individuals respond with R = 1:
threshold_1 = log(.224/(1-.224))
threshold_1 %>% round(2)
## [1] -1.24
As you might have guessed, we can also read the probability of different responses from the cumulative logistic distribution:
plot_plogis.thresh = function(thresholds, plot.thrsh.eq = FALSE, xlim = c(-5,5)) {
x.st = xlim[1]-1
x.en = xlim[2]+1
n_cat = length(thresholds) + 1
clrs =
colorRampPalette(c("blue","blue4"))(n_cat)
curve(plogis(x),x.st,x.en,xlim = c(-5,5),
xlab = "latent scholastic aptitude")
for (k in 1:n_cat) {
s = ifelse(k == 1, x.st, thresholds[k-1])
e = ifelse(k == n_cat, x.en, thresholds[k])
curve(plogis(x), s,e, col = clrs[k], add = T, lwd = 2)
b = ifelse(k == 1, 0, plogis(thresholds[k-1]))
t = ifelse(k == n_cat, 1, plogis(thresholds[k]))
x = seq(s,e,.01)
y.p = c(plogis(x), plogis(max(x)))
x.p = c(x,x.en)
polygon(c(x.p,x.en),c(y.p,plogis(x[1])), col = clrs[k], border = "NA")
arrows(x0 = -4, y1 = t, y0 = b, angle = 90, code = 3, length = .15)
prob.level =
ifelse(
k == 1,
plogis(thresholds[k]),
ifelse(
k == n_cat,
1-plogis(thresholds[k-1]),
plogis(thresholds[k])-plogis(thresholds[k-1]))) %>%
round(2)
text(-4, (b+t)/2, paste0("P(R=",k,")=", prob.level), pos = 4)
if (k < n_cat) {
p=plogis(thresholds[k])
thrsh = log(p/(1-p)) %>% round(2)
thrsh.equ = bquote(frac(.(round(p,2)),.(round(1-p,2)))~"=exp("~.(thrsh)~")")
if (plot.thrsh.eq == TRUE) text(5.5,p,thrsh.equ, xpd = T, pos = 4)
arrows(x0 = thresholds[k], y0 = 0, y1 = p, col = "red", lty = 2)
arrows(x0 = thresholds[k], x1 = -5.4, y0 = p, col = "red", lty = 2)
}
}
points(thresholds,rep(0,n_cat-1), pch = "|", col = "red")
}
set_par(cex = 1.5, mar = c(3,3,.5,7))
plot_plogis.thresh(thresholds, plot.thrsh.eq = TRUE)
Let’s again simulate data from the model and estimate the thresholds.
N = 1000
R = cut(
rlogis(1000), # latent variable
breaks = c(-Inf,thresholds,Inf)) %>%
as.numeric()
which we can show as histogram and as cumulative counts:
set_par(mfrow = c(1,2))
R %>%
table() %>%
barplot(ylab = "N",
col = clrs,
xlab = "Response")
m = diag(4)
m[upper.tri(m)] = 1
m = apply(m,2, function(x) x*R %>% table())
x = barplot(m, col = clrs,
ylab = "cumulative N",
xlab = "Response",
names.arg = 1:n_cat)
ps = c(plogis(thresholds[1]),diff(c(plogis(thresholds),1)))
cs = c(0,plogis(thresholds),1)
for (k in 1:n_cat)
text(tail(x,1),
mean(cs[k:(k+1)])*N,
paste(round(ps[k]*100),"%"),
col ="white",
cex = .5)
Here is an ordered logistic model:
bol.model =
alist(
R ~ dordlogit(0,thresholds),
thresholds ~ dnorm(0,3)
)
The key difference between the simple logistic and the cumulative ordered logistic regression is that the former has only one threshold parameter:
\[ log \left( \frac{P(fail)}{P(success)} \right) = \alpha \]
and the latter has \(\small \textrm{1-number of levels}\) threshold parameters:
\[ log \left( \frac{P(R>\alpha_k)}{P(R<\alpha_k)} \right) = \alpha_k \]
The dordlogit function in the rethinking package
calculates the likelihood of the observations given the model
using the plogis / inv_logit function, whereby
different threshold values lead to different likelihoods:
set_par(mfrow = c(2,2))
thresh_list = list(thresholds,thresholds-1.25)
for (k in 1:2) {
thres = thresh_list[[k]]
plot_plogis.thresh(thres)
rbind(R %>%
table %>%
prop.table(),
rlogis(100) %>%
cut(c(-Inf,thres,+Inf)) %>%
table %>% prop.table()) %>%
barplot(beside = T,
col = c("grey50","blue"),
ylab="proportion",
xlab = "Response")
legend("topleft", fill = c("grey50","blue"),
legend = c("data",expression("model | "~theta)), bty = "n")
}
Moving the thresholds changes the probability / likelihood to observe
the different categories.
The basic ordered logistic regression model without predictors finds the threshold values that allow to reproduce the observed distribution of ratings.
We fit the model.
bol.fit = ulam(
bol.model,
data=list(R=R),
chains=4,cores=4)
And here are the estimated thresholds, with the true thresholds as vertical, dotted, blue lines.
precis(bol.fit, depth = 2) %>% plot()
threshold.est = precis(bol.fit, depth = 2)[,1]
text(threshold.est,
3:1,
round(threshold.est,2), pos = 3)
abline(v = thresholds,col = "blue", lty = 3)
As expected, we recover the correct threshold values (with some error/bias due to simulation and prior).
To understand how to set up a model, such that a variable can increase or decrease the probability of higher response levels, lets look at the plot of latent variable and thresholds again. The plot also shows a hypothetical person with an average latent scholastic aptitude of 0 as a white dot.
set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)
plotrix::draw.circle(0,.005,radius = .1,col = "white")
If we want to increase the chance that this hypothetical person achieves a rating of R = 3, we can either increase the value of the latent variable or shift the the thresholds to the left:
set_par(mfrow = c(1,2),cex = 1.25)
plot_dlogis.thresh(thresholds)
arrows(x0 = 0, x1 = 1, y0 = .005, length = .1, col = "yellow")
plotrix::draw.circle(1,.005, radius = .1,col = "white")
plotrix::draw.circle(0,.005, radius = .1,col = adjustcolor("white",alpha = .75))
thresholds.b = thresholds - 1
plot_dlogis.thresh(thresholds.b)
abline(v = thresholds, col = adjustcolor("red",alpha = .25), lty = 2)
arrows(x0 = thresholds, x1 = thresholds.b,
y0 = c(seq(.025,.075,length.out = 3)),
col = "yellow", length = .1)
plotrix::draw.circle(0,.005,radius = .1,col = "white")
If we shift the latent value or the thresholds by the same amount, the probability of a higher response rises equally. In both plots above the bright white dot is in in the middle between the 2/3 and 3/4 thresholds.
Therefore we can, as we saw earlier for the logistic regression, add terms for individual level effects to the basic threshold / intercept only model to capture the effect of predictor variables on responses:
\[ log \left( \frac{P_i(R>\alpha_k)}{P_i(R<\alpha_k)} \right) = \alpha_k - \beta \cdot X_i \] where \(\small \alpha_k\) are thresholds and \(\beta\) captures the effect of the variable \(X\) by modifying each individual’s threshold.
Now we have seen previously that a threshold is just the log odds of responding in a category above vs below the threshold. Therefore, a positive (negative) regression weight in an ordinal logistic regression should be interpreted as the log odds ratio to give a one level higher (lower) response, i.e. R = 4 instead of R = 3 (R = 3 instead of R = 4).
Because there is only one regression weight \(\small \beta\) per variable that is added all thresholds, the log odds ratios are the same for all category transitions, i.e.
\[ log \left( \frac{P_i(R>\alpha_1|X = 0)}{P_i(R>\alpha_1|X = 1)} \right) = log \left( \frac{P_i(R>\alpha_2|X = 0)}{P_i(R>\alpha_2|X = 1)} \right) = log \left( \frac{P_i(R>\alpha_3|X = 0)}{P_i(R>\alpha_3|X = 1)} \right) \]
This is referred to as the proportional odds assumptions, which may or may not be true for the data at hand. There are alternative models that do not make this assumption. Here is a good paper that describes these models: Ordinal Regression Models in Psychology: A Tutorial
To see the ordered logistic regression in action, we’ll use the example data from the Portuguese school1. In particular, we are looking at the final grade and estimate again the association with maternal education.
df=read.table("../Chapter11/data/student-mat.csv",sep=";",header=TRUE)
df = df[df$Medu>0,]
set_par()
df$G3 = as.numeric(cut(df$G3, breaks = seq(0,20,2),include.lowest = T))
df$G3 %>% table() %>% barplot()
Yes, ordinal logistic regressions can also handle multimodal data!
To recapitulate that thresholds are just log odds at category boundaries, lets calculate them:
# 1. Cumulative probability of all classes
cum_prob = df$G3 %>% table() %>% prop.table() %>% cumsum()
cum_prob = cum_prob[cum_prob!=1]
# calculate logg odds
simple_thresholds = log(cum_prob/(1-cum_prob))
simple_thresholds %>% round(2)
## 1 2 3 4 5 6 7 8 9
## -2.23 -2.20 -1.69 -1.04 -0.11 0.71 1.51 2.73 4.16
And we estimate a thresholds only model with ulam:
tm.fit = ulam(
alist(
G3 ~ dordlogit(0,thresholds),
thresholds ~ dnorm(0,3)),
data=list(G3 = df$G3),
chains=4,cores=4)
Now we can plot the manually calculated thresholds against those
estimated with ulam:
post = extract.samples(tm.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
ulam_thresholds,
ylim = range(CI),
pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
y0 = CI[1,], y1 = CI[2,],
col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")
Now that we have understood thresholds, we can implement a model that uses predictors. Specifically, we estimate the effect of past failure and maternal education on grades and allow an interaction between these two variables. Ordinal logistic regressions allow differences in the magnitude of slopes between groups without modelling interactions explicitly. But we do need to model interactions explicitly if we want to allow slopes with different signs for different groups.
ol.model =
alist(
G3 ~ dordlogit(phi,thresholds),
phi <- bM*(Medu-2.5) + iFE*failures,
iFE <- bF + bFE*(Medu-2.5),
c(bF, bM, bFE) ~ normal(0,1),
thresholds ~ dnorm(0,3)
)
We fit the model.
ol.fit = ulam(
ol.model,
data=list(G3 = df$G3, Medu = df$Medu, failures = df$failures),
chains=4,cores=4)
We first just check Rhat values to make sure the model converged:
precis(ol.fit, depth = 2) %>% round(3)
## mean sd 5.5% 94.5% n_eff Rhat4
## bFE -0.199 0.120 -0.390 -0.009 1475.037 0.999
## bM 0.331 0.090 0.183 0.467 1559.750 1.000
## bF -0.376 0.292 -0.850 0.091 1428.148 0.999
## thresholds[1] -1.899 0.306 -2.400 -1.410 1606.995 1.001
## thresholds[2] -1.837 0.304 -2.329 -1.353 1639.873 1.000
## thresholds[3] -1.266 0.283 -1.717 -0.811 1569.978 1.000
## thresholds[4] -0.539 0.267 -0.959 -0.107 1628.050 1.000
## thresholds[5] 0.523 0.269 0.101 0.954 1578.267 1.000
## thresholds[6] 1.436 0.280 0.986 1.882 1539.712 1.001
## thresholds[7] 2.293 0.295 1.819 2.763 1495.422 1.001
## thresholds[8] 3.561 0.345 3.008 4.105 1752.300 1.001
## thresholds[9] 5.047 0.487 4.317 5.830 2675.234 0.999
This looks OK.
As a quick posterior predictive check we compare the histogram of the
observed and predicted grades. We use the sim function
instead of the link function to get posterior predictions
on the scale of the ordered outcome variable.
set_par()
pp = sim(ol.fit)
obs.counts = cut(df$G3, breaks = seq(.5,10.5,1)) %>% table()
plot(0:10,c(0,obs.counts), 'S', xaxt = "n", ylim = c(0,110), ylab = "grade")
axis(1,at = (1:10)-.5, labels = 1:10, lwd = 2)
for (k in 1:250) {
pp.counts = cut(pp[k,], breaks = seq(.5,10.5,1)) %>% table()
lines(0:10, c(0,pp.counts), 'S', col = adjustcolor("blue",alpha = .25))
}
lines(0:10,c(0,obs.counts), 'S', lwd = 2)
We can see that the ordinal logistic model has no problems modeling a
multimodal response distribution.
Lets quickly compare the thresholds with thresholds from a thresholds only model again:
post = extract.samples(ol.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
ulam_thresholds,
ylim = range(CI),
pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
y0 = CI[1,], y1 = CI[2,],
col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")
Because the model now also estimates effects of education and past failures, the thresholds are a bit different2. Only if the the covariates would be independent of grades would we expect to see identical thresholds.
Now lets look at the coefficients of interest:
coeffs = precis(ol.fit, pars = c("bM","bF","bFE"))
coeffs %>% plot()
coeffs %>% round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## bM 0.33 0.09 0.18 0.47 1559.75 1
## bF -0.38 0.29 -0.85 0.09 1428.15 1
## bFE -0.20 0.12 -0.39 -0.01 1475.04 1
What do these results mean?3
It is a bit hard to interpret these results because they are on the log odds ratio scale and because it is not totally clear (to me) how to interpret interaction effects. Therefor, we use posterior predictions to understand the results better.
First, we create posterior predictions for all levels of maternal education and past fails:
sim.data = expand.grid(
Medu = sort(unique(df$Medu)),
failures = sort(unique(df$failures)))
pp = sim(ol.fit,data = sim.data, n = 2000)
Here is a general overview:
mu = colMeans(pp)
CI80 = apply(pp,2,function(x) PI(x, prob = .8))
CI90 = apply(pp,2,function(x) PI(x, prob = .9))
set_par(cex = 1.5)
plot(0,type = "n", xlim = c(0.5,4.5),
ylim = range(CI90), xaxt = "n",
xlab = "previous fails",
ylab = "grade")
axis(1,at = 1:4, labels = 0:3)
for (Medu in 1:4) {
idx = sim.data$Medu == Medu
x = (1:4)+(Medu-2.5)/6
points(x,mu[idx], pch = 16, col = Medu)
arrows(x0 = x, y0 = CI80[1,idx], y1 = CI80[2,idx], length = 0, col = Medu, lwd = 2)
arrows(x0 = x, y0 = CI90[1,idx], y1 = CI90[2,idx], length = 0, col = Medu)
}
legend("topright", pch = 16, col = 1:4, legend = 1:4, title = "Medu", bty = "n")
This plot suggests several questions:
Lets first look at the data for which we simulated outcomes:
sim.data
## Medu failures
## 1 1 0
## 2 2 0
## 3 3 0
## 4 4 0
## 5 1 1
## 6 2 1
## 7 3 1
## 8 4 1
## 9 1 2
## 10 2 2
## 11 3 2
## 12 4 2
## 13 1 3
## 14 2 3
## 15 3 3
## 16 4 3
Above, we used sim(ol.fit) to easily obtain predicted
response values. However, these are a bit noisy, because thy are
simulated randomly given the phi and the thresholds. We can be more
accurate by using only phi, and thresholds to calculate
response-probabilities and then calculate expected responses by
weighting the responses with these probabilities. This does not involve
simulation of individual level responses and is therefore less
noisy:
phi = link(ol.fit,data = sim.data)$phi
thresholds = extract_post_ulam(ol.fit)$thresholds
pp = matrix(NA, nrow = nrow(thresholds), ncol = nrow(sim.data))
for (k in 1:nrow(sim.data)) {
log_odds_ratio = apply(thresholds,2, function(x) x - phi[,k])
cum_prob = apply(log_odds_ratio,1, function(x) plogis(x)) %>% t()
cum_prob = cbind(cum_prob,1)
res_prob = cum_prob - cbind(0,cum_prob[,1:(ncol(cum_prob)-1)])
pp[,k] = apply(res_prob,1, function(x) sum(x*(1:10))) # expected grade
}
To answer the first question, “Do grades decrease from 0 to 3 previous fails?”, we calculate (note that the minimum number of previous fails is 0):
\[ \delta = \frac{1}{3}\sum^i_{1:3} G3(fails = i) - G3(fails = i-1) \]
posterior_hist = function(x,label, set_par = TRUE, xlim = NULL) {
if (set_par == T) set_par(cex = 1.5, mar=c(3,3,3,.5))
tlt = paste0(round(mean(x),2),", (",paste(round(PI(x),2), collapse = ", "),
")\n P(",label,">0) = ", round(mean(x>0),3))
hist(x, ylab = label, xlim = xlim, xlab = label,
main = tlt, cex.main = 1)
abline(v = 0, col = "red", lty = 3)
abline(v = (PI(x)), col = "blue", lwd = 2)
}
delta = rep(0,nrow(pp))
# pp is a matrix with posterior predictions
# for the data in sim.data
for (i in 1:3) { # i is number of previous fails
delta = delta +
pp[,which(sim.data$failures == i)] -
pp[,which(sim.data$failures == i-1)]
}
delta = delta / 3
posterior_hist( # un-hide previous code block for this function
delta,
"delta fails",
xlim = range(delta))
To answer the second question, “Is higher maternal education associated with higher grades when there were no previous fails and with lower grades when there were 2-3 fails?”, we first calculate the effect of maternal education in students with 0 and 2 or 3 previous fails:
# calculate contrasts
delta_F0 = rep(0,nrow(pp))
delta_F3 = rep(0,nrow(pp))
# we loop over maternal education levels 2-4
# each time calculating the difference to the next lower level
for (i in 2:4) { # here, i is the maternal education level
delta_F0 = delta_F0 +
pp[,which(sim.data$Medu == i & sim.data$failures == 0)] -
pp[,which(sim.data$Medu == i-1 & sim.data$failures == 0)]
delta_F3 = delta_F3 +
pp[,which(sim.data$Medu == i & sim.data$failures == 3)] -
pp[,which(sim.data$Medu == i-1 & sim.data$failures == 3)]
}
delta_F0 = delta_F0 / 3
delta_F3 = delta_F3 / 3
We see that whereas higher maternal education is associated with better grades when the student had not failed the class yet, there is very weak evidence for the opposite effect for students who failed the class 2-3 times already. To see if the two effects are really different, we need to compare them explicitly, which we do next:
And next we can calculate the difference between these two delta:
interaction_eff = delta_F0-delta_F3
posterior_hist(
interaction_eff,
"delta(fails=0)-delta(fails=3)",
xlim = range(interaction_eff))
We are reasonably certain that there is a difference in slopes, but we can’t be sure.
The ordered probit uses the standard normal distribution for the latent variable. The only disadvantage I see with the ordered probit is that the coefficients cannot be interpreted as (log) odds ratios.
However, one can then interpret regression coefficients as changes on the level of the latent variable on the scale of a standard normal distribution. This can be more attractive, if one is primarily interested in the latent variable. If one is primarily interested in the relative frequency of the response categories the ordered logistic model is easier to interpret.
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. The data can be downloaded here↩︎
Not centering predictor variables leads to additional differences↩︎
remember that because thresholds are changed as \(\small\alpha - \beta X\) positive values of \(\small \beta\) mean that higher positive values x move the thresholds to the left and thus make higher-level responses more likely.↩︎